
R Quick Intro Part 1: Basic Data Manipulation

This part takes a closer look at the “Tidyverse”, a unified approach to data wrangling in R.

It replaces some base R routines and provides a consistent interface to frequently used functions.

The documentation of the tidyverse stack of packages can be found here.

First Steps

Step 1

First, we load the tidyverse stack of packages. Not all of its 29 packages are attached automatically by library(tidyverse); readxl, for example, has to be loaded manually:

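library(tidyverse)   # attaches the core tidyverse packages (dplyr, ggplot2, readr, ...)
library(readxl)      # reads Excel files; installed with the tidyverse but not attached by it
library(dygraphs)    # interactive time-series charts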

We also load the dygraphs package for interactive chart output.
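To see which packages library(tidyverse) leaves unattached, the tidyverse package offers tidyverse_packages(), which lists everything the meta-package installs. The following is a minimal sketch, assuming a fresh R session in which readxl has not been loaded yet; the variable names are illustrative only:

```r
library(tidyverse)

# All packages that come with the tidyverse meta-package
# (include_self = FALSE leaves "tidyverse" itself out of the list)
all_pkgs <- tidyverse_packages(include_self = FALSE)

# Packages currently on the search path, with the "package:" prefix removed
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Installed with the tidyverse, but not attached by library(tidyverse)
not_attached <- setdiff(all_pkgs, attached)
print(not_attached)
```

In a fresh session, readxl shows up in not_attached, which is why it needs its own library() call.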

Step 2

Now we can read in an Excel file with the function read_excel():

data <- read_excel("Data/sales.xls")   # path relative to the working directory
str(data)                               # compact overview of columns and their types
tibble [7,085 x 11] (S3: tbl_df/tbl/data.frame)
 $ ID      : num [1:7085] 2004 2005 2006 2007 2008 ...
 $ Sales   : num [1:7085] 2691.9 910.5 69.2 22.6 4141.7 ...
 $ Cost    : num [1:7085] 1442 639.1 71.5 22.2 1184.6 ...
 $ Category: chr [1:7085] "Spices" "Spices" "Fruit" "Fruit" ...
 $ Product : chr [1:7085] "Saffron" "Saffron" "Plums" "Plums" ...
 $ SaleDate: POSIXct[1:7085], format: "2008-12-28" "2008-12-28" ...
 $ Quarter : chr [1:7085] "Q4" "Q4" "Q4" "Q4" ...
 $ Year    : num [1:7085] 2008 2008 2008 2008 2008 ...
 $ SalesRep: chr [1:7085] "Jessie O'Brien" "Jessie O'Brien" "Jessie O'Brien" "Jessie O'Brien" ...
 $ Region  : chr [1:7085] "Northeast" "Northeast" "Northeast" "Northeast" ...
 $ State   : chr [1:7085] "New Jersey" "New Jersey" "New Jersey" "New Jersey" ...

The result, like that of most basic tidyverse functions, is a “tibble”, an enhanced data frame.
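Since the file Data/sales.xls is not available outside the original setup, this sketch uses one of the example workbooks that ship with readxl (located via readxl_example()) to show the same read_excel() call and confirm that the result is a tibble. The choice of the "mtcars" sheet is just for illustration:

```r
library(readxl)

# readxl ships small example workbooks; readxl_example() returns the file path
path <- readxl_example("datasets.xlsx")

# Names of the worksheets contained in the workbook
excel_sheets(path)

# read_excel() guesses .xls vs .xlsx from the file extension
# (read_xls() and read_xlsx() force one format);
# 'sheet' selects a worksheet by name or by position
cars <- read_excel(path, sheet = "mtcars")

class(cars)  # "tbl_df" "tbl" "data.frame": a tibble is still a data frame
```

Because a tibble keeps the "data.frame" class, base R functions such as str() or nrow() work on it unchanged.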

Step 3

One of the most basic operations on data is sorting. The tidyverse function for this, from the dplyr package, is aptly called arrange():


Have a look at different sort orders.
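A minimal sketch of typical arrange() calls follows. It does not read the Excel file; instead it builds a small tibble from the first four rows shown in the str() output above (selected columns, Sales values as displayed there), so the name sales is purely illustrative:

```r
library(dplyr)

# Illustration only: first four rows of the sales data as shown by str()
sales <- tibble(
  ID       = c(2004, 2005, 2006, 2007),
  Sales    = c(2691.9, 910.5, 69.2, 22.6),
  Category = c("Spices", "Spices", "Fruit", "Fruit"),
  Product  = c("Saffron", "Saffron", "Plums", "Plums")
)

# Ascending order is the default
by_sales <- arrange(sales, Sales)

# desc() reverses the order for one column
by_sales_desc <- sales %>% arrange(desc(Sales))

# Several columns: sort by Category first, then by Sales (descending) within each category
by_cat_sales <- arrange(sales, Category, desc(Sales))
by_cat_sales$ID  # 2006 2007 2004 2005
```

Ties in the first sort column are broken by the next one, so each additional argument refines the order within groups of equal values.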

Charts: dygraph

[Charts: dygraph panels titled “Lung Deaths (All)”, “Lung Deaths (Male)” and “Lung Deaths (Female)”; further chart labels: “Healthy child programme”, “Health protection programme”, “NHS Health check programme”]
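The lung-deaths chart titles suggest they were built from R's built-in monthly time series ldeaths, mdeaths and fdeaths (UK deaths from bronchitis, emphysema and asthma, 1974 to 1979). Assuming that, a minimal sketch of three synchronized dygraph charts could look like this; the group name "lung-deaths" is an arbitrary choice:

```r
library(dygraphs)

# ldeaths (all), mdeaths (male) and fdeaths (female) are ts objects from the
# datasets package, so dygraph() can plot them directly.
# Charts sharing the same 'group' value zoom and highlight in sync.
d_all    <- dygraph(ldeaths, main = "Lung Deaths (All)",    group = "lung-deaths")
d_male   <- dygraph(mdeaths, main = "Lung Deaths (Male)",   group = "lung-deaths")
d_female <- dyRangeSelector(
  dygraph(fdeaths, main = "Lung Deaths (Female)", group = "lung-deaths")
)  # adds a range selector below the female chart

# Printing an object (e.g. in RStudio or an R Markdown document) renders the chart
d_all
```

Each result is an htmlwidget, so it can also be embedded in R Markdown output or Shiny apps.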